Representative Subgraph Sampling using Markov Chain Monte Carlo Methods
Abstract
Bioinformatics and the Internet keep generating graph data with thousands of nodes. Most traditional graph algorithms for data analysis are too slow for analysing these large graphs. One way to work around this problem is to sample a smaller ‘representative subgraph’ from the original large graph. Existing representative subgraph sampling algorithms either randomly select sets of nodes or edges, or they explore the vicinity of a randomly drawn node. None of these existing approaches makes use of topological properties of the original graph, and they provide good samples only down to sample sizes of approximately 15% of the number of nodes in the original graph. In this article, we propose novel sampling methods for representative subgraph sampling, based on the Metropolis algorithm and Simulated Annealing. The key idea is to find a subgraph that preserves those properties of the original graph that are efficient to compute or to approximate. In our experiments, we improve over the pioneering work of Leskovec and Faloutsos (KDD 2006) by producing representative subgraph samples that are both smaller and of higher quality than those produced by other methods from the literature.
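The general idea of Metropolis-based subgraph sampling can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of the degree distribution as the preserved property, the Kolmogorov-Smirnov style distance, the swap proposal, the exponential acceptance rule, and all names (`degree_cdf`, `ks_distance`, `metropolis_subgraph`, `temperature`) are assumptions made here for illustration only.

```python
import math
import random
from collections import Counter


def degree_cdf(adj, nodes, max_degree):
    """Cumulative degree distribution of the subgraph induced by `nodes`."""
    nodes = set(nodes)
    # Degree of v inside the induced subgraph: neighbours that are also in `nodes`.
    counts = Counter(sum(1 for u in adj[v] if u in nodes) for v in nodes)
    cdf, running = [], 0
    for d in range(max_degree + 1):
        running += counts.get(d, 0)
        cdf.append(running / len(nodes))
    return cdf


def ks_distance(cdf_a, cdf_b):
    """Largest absolute gap between two CDFs of equal length."""
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))


def metropolis_subgraph(adj, sample_size, steps=2000, temperature=0.05, seed=0):
    """Search for a node set whose induced subgraph mimics the degree CDF of `adj`.

    Assumes an undirected graph given as {node: set_of_neighbours}, sortable
    node labels, and 0 < sample_size < number of nodes.
    """
    rng = random.Random(seed)
    all_nodes = sorted(adj)
    max_degree = max(len(adj[v]) for v in all_nodes)
    target = degree_cdf(adj, all_nodes, max_degree)

    current = set(rng.sample(all_nodes, sample_size))
    cost = ks_distance(degree_cdf(adj, current, max_degree), target)
    best, best_cost = set(current), cost

    for _ in range(steps):
        # Proposal: swap one sampled node for one node outside the sample.
        out_node = rng.choice(sorted(current))
        in_node = rng.choice([v for v in all_nodes if v not in current])
        proposal = (current - {out_node}) | {in_node}
        new_cost = ks_distance(degree_cdf(adj, proposal, max_degree), target)
        # Metropolis rule: always accept improvements, sometimes accept worse states.
        if new_cost <= cost or rng.random() < math.exp((cost - new_cost) / temperature):
            current, cost = proposal, new_cost
            if cost < best_cost:
                best, best_cost = set(current), cost
    return best, best_cost
```

Lowering `temperature` gradually over the iterations, instead of keeping it fixed, would turn this sketch into a simulated-annealing variant. Other cheaply computable properties, such as the clustering coefficient, could replace the degree distribution in the cost function.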
Similar resources
FS3: A sampling based method for top-k frequent subgraph mining
Mining labeled subgraphs is a popular research task in data mining because of its potential applications in many different scientific domains. All existing methods for this task explicitly or implicitly solve the subgraph isomorphism problem, which is computationally expensive, so they scale poorly when the graphs in the input database are large. In this work, we pr...
Bayesian Analysis of the Stochastic Switching Regression Model Using Markov Chain Monte Carlo Methods
This study develops Bayesian methods for estimating the parameters of the stochastic switching regression model. Two Markov chain Monte Carlo methods, data augmentation and Gibbs sampling, are used to facilitate estimation of the posterior means. The main feature of these two methods is that the posterior means are estimated by the ergodic averages of samples drawn from conditional distributions which...
Monte Carlo Methods and Bayesian Computation: MCMC
Markov chain Monte Carlo (MCMC) methods use computer simulation of Markov chains in the parameter space. The Markov chains are defined in such a way that the posterior distribution in the given statistical inference problem is the asymptotic distribution. This allows ergodic averages to be used to approximate the desired posterior expectations. Several standard approaches to define such Markov chai...
Markov Chain Monte Carlo Methods: Computation and Inference
This chapter reviews the recent developments in Markov chain Monte Carlo simulation methods. These methods, which are concerned with the simulation of high-dimensional probability distributions, have gained enormous prominence and revolutionized Bayesian statistics. The chapter provides background on the relevant Markov chain theory and provides detailed information on the theory and practice of ...
Markov chain Monte Carlo methods for Dirichlet process hierarchical model
Inference for Dirichlet process hierarchical models is typically performed using Markov chain Monte Carlo methods, which can be roughly categorised into marginal and conditional methods. The former integrate out analytically the infinite-dimensional component of the hierarchical model and sample from the marginal distribution of the remaining variables using the Gibbs sampler. Conditional metho...
Publication date: 2008